China's Stock Market: A Story of Regulation, Leverage, and Premium

Data Bootcamp UG Fall 2016

Written by Xingyan Li, NYU Stern Class of 2017

Data Background

China’s capital markets are developing rapidly, and its equity market has ranked second in the world by total market capitalization since 2009. As of May 2015, the market capitalization of domestic issues on the booming Shanghai (SSE) and Shenzhen (SZSE) exchanges exceeded $10 trillion, and surpassed $14 trillion when Hong Kong (HKEX) is included, highlighting the extraordinary spread of market-based finance in a country led for more than 65 years by its Communist Party. However, in the summer of 2015 China went through a period of stock market turbulence, worsened by economic weakness, financial panic, and the policy response to these problems.

Mainland China’s equity markets differ from their Hong Kong counterpart because of capital controls under the Communist Party. Despite the recent launch of Shanghai-Hong Kong Stock Connect, which lowers the cost of cross-border transactions, investors in each market still have limited access to shares listed on the other market because of restrictions on investor eligibility, stock selection, and investment quotas.

Cross market indices such as the Hang Seng China AH Premium Index and Hang Seng China 50 Index are hybrid inventions under China's current political and capital structure, capturing investment opportunities created by exposure to a comprehensive China investment universe (Mainland-listed A and B shares, Hong Kong-listed H shares, Red Chips and shares of other Mainland companies).

My project focuses primarily on the Hang Seng China AH Premium Index, which tracks the average price difference of A shares over H shares for the largest and most liquid Chinese companies with both A-share and H-share listings (“AH Companies”). The index shows that A shares commonly trade at a premium to shares of the same firms in Hong Kong (H shares), and this price discrepancy is worth investigating, especially under the stress test of China's market crash in summer 2015.
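
The per-company premium behind such an index can be illustrated with a small calculation. This is a minimal sketch and not the Hang Seng methodology: the function name `ah_premium_pct`, the prices, and the exchange rate are all hypothetical, and it assumes the A-share price is converted from CNY to HKD before being compared with the H-share price.

```python
def ah_premium_pct(a_price_cny, h_price_hkd, hkd_per_cny):
    """Premium (in %) of an A share over its H share, in a common currency.

    All inputs are illustrative; hkd_per_cny is an assumed exchange rate.
    """
    a_price_hkd = a_price_cny * hkd_per_cny      # express the A share in HKD
    return (a_price_hkd / h_price_hkd - 1.0) * 100.0


# Hypothetical example: A share at CNY 10, H share at HKD 10, 1 CNY = 1.25 HKD
premium = ah_premium_pct(a_price_cny=10.0, h_price_hkd=10.0, hkd_per_cny=1.25)
print(round(premium, 2))
```

A positive value means the A share is more expensive than the H share once both are in the same currency; a negative value means it trades at a discount.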

Data Dictionary

"A" Shares: shares of Chinese companies listed in Mainland China, traded in local currency Chinese Yuan

"H" Shares: shares of Chinese companies listed in Hong Kong, traded in Hong Kong dollars

"B" Shares: shares available to foreign investors, quoted in U.S. dollars in Shanghai and Hong Kong dollars in Shenzhen; fewer and fewer investors follow this market and it may be phased out

Abstract

I will first analyze market trends for both the Shanghai Composite and the HSI (Hang Seng Index) to examine the correlation between two markets governed by different market schemes. Unlike the HSI, the Shanghai Composite is still not entirely open to foreign investors because of tight capital account controls enforced by the Mainland authorities. In other words, the Shanghai Composite is primarily a market index dominated by domestic investors. Then, I will pick companies from the Hang Seng China AH Premium Index and graph both their A-share and H-share performance to discuss the premium before and after the 2015 market crash.

Import Packages


In [1]:
import sys                             # system module
import pandas as pd                    # data package
import matplotlib as mpl               # graphics package
import matplotlib.pyplot as plt        # pyplot module
import numpy as np                     # foundation for Pandas 
import datetime as dt 
import html5lib

# plotly imports
from plotly.offline import iplot, iplot_mpl  # plotting functions
import plotly.graph_objs as go               # ditto
import plotly                                # just to print version and init notebook
import cufflinks as cf                       # gives us df.iplot that feels like df.plot
cf.set_config_file(offline=True, offline_show_link=False)

# check versions
print('Python version:', sys.version)
print('Pandas version: ', pd.__version__)
print('Plotly version: ', plotly.__version__)
print('Today: ', dt.date.today())

# Puts plots in notebook 
%matplotlib inline


Python version: 3.5.2 |Anaconda 4.2.0 (x86_64)| (default, Jul  2 2016, 17:52:12) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
Pandas version:  0.19.0
Plotly version:  1.12.11
Today:  2016-12-22

Creating Datasets: HSI vs. Shanghai Composite in 2015


In [2]:
# Read data from Yahoo Finance through DataReader
from pandas_datareader.data import DataReader
from datetime import datetime

# Track the performance of HSI and Shanghai Composite in year of 2015 
start = dt.datetime(2015, 1, 1)  
end = dt.datetime(2015, 12, 31)

# Put shcomp and hsi data into dataframes for further analysis. 
shcomp = DataReader('000001.SS',  'yahoo', start, end) 
shcomp = shcomp['Close']
shcomp = pd.DataFrame(shcomp)

hsi = DataReader('^HSI',  'yahoo', start, end)
hsi = hsi['Close']
hsi = pd.DataFrame(hsi)

# Pick a specific mutual listing company to discuss the premium of its A share relative to its H share counterpart
PingAn_H = DataReader('2318.HK',  'yahoo', start, end) 
PingAn_H = PingAn_H['Close']
PingAn_H = pd.DataFrame(PingAn_H)

PingAn_A = DataReader('601318.SS',  'yahoo', start, end)
PingAn_A = PingAn_A['Close']
PingAn_A = pd.DataFrame(PingAn_A)

In [3]:
# Reset index for later proper merging of datasets
shcomp = shcomp.reset_index()
hsi = hsi.reset_index()
PingAn_H = PingAn_H.reset_index()
PingAn_A = PingAn_A.reset_index()

In [4]:
combo = pd.merge(hsi, shcomp,     # left and right df's
                 how='inner',     # keep only dates present in both frames
                 on='Date'        # link with this variable/column 
                ) 
combo = combo[['Date','Close_x','Close_y']]
combo.columns = ['Date','HSI','SHCOMP']   # flat list of column names
combo.plot(kind ='line',figsize = (8,6), x = 'Date', subplots = True, title = 'Shanghai Composite VS Hang Seng Index')


Out[4]:
array([<matplotlib.axes._subplots.AxesSubplot object at 0x11ab5ec50>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x119f9bd30>], dtype=object)
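
The plots show co-movement visually; one common way to put a number on it is the correlation of daily returns. The sketch below uses synthetic prices generated with a shared shock, not the Yahoo data loaded above, so the frame `demo` and every value in it are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices for illustration only (not market data)
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", periods=250, freq="B")
common = rng.normal(0, 0.01, size=250)           # shock shared by both markets
demo = pd.DataFrame(
    {
        "HSI": 100 * np.exp(np.cumsum(common + rng.normal(0, 0.005, 250))),
        "SHCOMP": 100 * np.exp(np.cumsum(common + rng.normal(0, 0.015, 250))),
    },
    index=dates,
)

# Correlate daily percentage returns rather than price levels,
# since trending price levels can look correlated even when returns are not
returns = demo.pct_change().dropna()
corr = returns["HSI"].corr(returns["SHCOMP"])
print(round(corr, 2))
```

With the notebook's own data, the same two lines at the end could be applied to the `HSI` and `SHCOMP` columns of `combo` after setting `Date` as the index.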

In [5]:
# Merge the datasets based on 'Date' using inner method. I want to use intersection of trading days from both frames
combo1 = pd.merge(PingAn_H, PingAn_A,   # left and right df's
                 how='inner',           # intersection of trading days from both Shanghai and Hong Kong 
                 on='Date'              # link with this variable/column 
                ) 
combo1.columns = ['Date','PingAn_H','PingAn_A']   # flat list of column names
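
To see what the inner merge does with mismatched trading calendars, here is a small sketch with two made-up frames. The dates and prices are hypothetical; the `suffixes` argument is used only to make the resulting column names self-explanatory.

```python
import pandas as pd

# Hypothetical closes: Hong Kong trades on Jan 2, Shanghai does not; Shanghai trades on Jan 7
hk = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-02", "2015-01-05", "2015-01-06"]),
    "Close": [60.0, 61.0, 62.0],
})
sh = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-05", "2015-01-06", "2015-01-07"]),
    "Close": [70.0, 71.0, 72.0],
})

# how='inner' keeps only the dates that appear in both frames
both = pd.merge(hk, sh, how="inner", on="Date", suffixes=("_H", "_A"))
print(both)
```

Only the two shared dates survive the merge, which is why the merged frame can be shorter than either input.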

In [6]:
# Divide Ping An's A share price by its H share price to create a new column, 'AH Index'
# Note: A shares are quoted in CNY and H shares in HKD, so this raw ratio is not currency-adjusted
combo1['AH Index']=combo1['PingAn_A']/combo1['PingAn_H']
combo1.plot(subplots=True,figsize = (8,6), x='Date')


Out[6]:
array([<matplotlib.axes._subplots.AxesSubplot object at 0x11aae9908>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x107298518>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x114fb2828>], dtype=object)

Data Analysis: Daily Returns and General Trend

  • During China's stock crash in summer 2015, Ping An's A share listing suffered a more drastic loss than its H share listing, upsetting what had been a persistent AH premium.
  • The statistical summary below shows that Ping An's A share listing has a higher standard deviation than its H share listing (23.6 vs. 16.7); the 2015 summer market crash therefore serves as a stress test of the real valuations of firms listed on China's Mainland exchanges.
  • China's equity market had rallied since March 2015 before the market crashed in July; during this period the AH Index also skyrocketed for dual-listed companies, as their Mainland shares traded more strongly than their Hong Kong listings. However, this premium vanished following the stock crash, suggesting a fundamental mispricing of the Mainland's public companies.

In [7]:
# Statistical analysis to display standard deviation of Ping An's A share and H share performance
combo1.describe()


Out[7]:
PingAn_H PingAn_A AH Index
count 261.000000 261.000000 261.000000
mean 55.083142 58.501379 1.067574
std 16.707415 23.606698 0.378483
min 35.550000 25.110000 0.687945
25% 43.400000 34.070000 0.787991
50% 47.675000 68.490000 0.832596
75% 57.450000 80.400000 1.527184
max 93.450000 93.170000 1.884556
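
The `std` row above measures the dispersion of price levels, which depends on the price scale. A common alternative is the standard deviation of returns. The sketch below uses two made-up series with identical percentage moves but different price scales; the column names and values are hypothetical.

```python
import pandas as pd

# Two hypothetical series with identical percentage moves but different price scales
prices = pd.DataFrame({"low_priced": [10.0, 11.0, 9.9, 10.89]})
prices["high_priced"] = prices["low_priced"] * 5

level_std = prices.std()                          # scales with the price level
return_std = prices.pct_change().dropna().std()   # scale-free measure of variability

print(level_std)
print(return_std)
```

The level-based figure is five times larger for the higher-priced series even though both move identically, so return-based volatility is usually the safer basis for comparing two listings quoted in different currencies.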

Data Analysis: Monthly Returns and Volatility

To analyze the distribution of dual-listing returns and discuss their volatility, I resample the data to monthly frequency. The companies chosen here are Ping An and Tsingtao Brew, and I use a 3-year period (2013 to 2015) for this analysis.


In [8]:
start = dt.datetime(2013, 1, 1)  
end = dt.datetime(2015, 12, 31)    
tickers = ['2318.HK','601318.SS']   
pingan2 = DataReader(tickers, 'yahoo', start, end)
pingan2 = pingan2.to_frame().unstack()['Close']  # unstack the data and choose the "Close" column
pingan2 = pingan2.resample('MS').mean()          # monthly mean of daily closes, indexed at month start

pingan2pct = pingan2.pct_change().shift(-1)      # move pct_change result one unit up
pingan2pct = pingan2pct.round(4)*100             # convert to percent
pingan2pct.plot(kind='line', title='Monthly Return') 
pingan2pct.head(3)

# Apply histogram to show the distribution of Ping An's dual listing returns
pingan2pct.plot(kind='hist',subplots=True, bins = 30,title="Histogram for 2318.HK and 601318.SS")


Out[8]:
array([<matplotlib.axes._subplots.AxesSubplot object at 0x11bc62ef0>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x11bcd77b8>], dtype=object)

We can see above that Ping An's A share (601318.SS) had a wider distribution of monthly returns (roughly between -40% and 40%) than its H share listing, suggesting higher volatility.
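
The monthly resampling step can be illustrated on a small synthetic series. This is a sketch with made-up prices, not the Yahoo data used above; it takes the monthly mean explicitly with `.resample('MS').mean()` and omits the `shift(-1)` alignment for simplicity.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closes: flat at 100 in January, 110 in February, 99 in March
days = pd.date_range("2015-01-01", "2015-03-31", freq="D")
values = np.where(days.month == 1, 100.0,
                  np.where(days.month == 2, 110.0, 99.0))
close = pd.Series(values, index=days)

monthly = close.resample("MS").mean()        # one average price per month, labeled at month start
monthly_ret = monthly.pct_change() * 100     # month-over-month change in percent

print(monthly_ret.round(2))
```

Because the monthly value is an average of daily closes, the result is the change in average price rather than a month-end to month-end return; `.resample('MS').last()` would be one alternative if the latter is wanted.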


In [9]:
tickers2 = ['0168.HK','600600.SS']   
tsingtao_brew  = DataReader(tickers2, 'yahoo', start, end)
tsingtao_brew = tsingtao_brew.to_frame().unstack()['Close']  # unstack the data and choose the "Close" column
tsingtao_brew = tsingtao_brew.resample('MS').mean()          # monthly mean of daily closes, indexed at month start

tsingtao_brewpct = tsingtao_brew.pct_change().shift(-1)      # move pct_change result one unit up
tsingtao_brewpct = tsingtao_brewpct.round(4)*100             # convert to percent
tsingtao_brewpct.plot(kind='line', title='Monthly Return') 
tsingtao_brewpct.head(3)

# Apply histogram to show the distribution of Tsingtao Brew's dual listing returns
tsingtao_brewpct.plot(kind='hist',subplots=True, bins = 30,title="Histogram for 0168.HK and 600600.SS")


Out[9]:
array([<matplotlib.axes._subplots.AxesSubplot object at 0x11c354940>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x11c3b3b00>], dtype=object)

The distribution of Tsingtao Brew's dual-listing returns again shows that the A share (600600.SS) is more volatile than its H share counterpart.

Creating Datasets: HSI AH Premium Index

  • HSI publishes official data on the AH Premium Index as an online PDF file, which includes relevant information such as the index's launch date and constituents.

  • I tried several packages for reading data from the PDF, but none of them worked. So I manually collected the data from HSI's report and uploaded it as a CSV file to a public repository on my personal GitHub for future analysis.

  • The dataset below shows each dual listing's industry classification and A/H price ratio. I am interested in exploring which sectors tend to trade at a higher ratio, by grouping the data and computing the average ratio for each sector.

Data updated and published by HSI in Oct. 2016.


In [10]:
# Create the path of url/csv to access
path="https://raw.githubusercontent.com/simonxingyanli/kidscoding/master/hsi_ahpremium.csv"
hsi_ahpremium=pd.read_csv(path)
hsi_ahpremium.head(5)


Out[10]:
Company Name Industry Classification A/H Price Ratio (%)
0 CSCL Industrials 289.54
1 Sinopec SSC Energy 279.62
2 GAC Group Consumer Goods 273.07
3 COMEC Industrials 269.86
4 SH Electric Industrials 268.67

We now have a dataframe of all listings within the AH Premium Index.


In [11]:
# Calculate the average A/H Price Ratio for constituents included in the index
hsi_ahpremium['A/H Price Ratio (%)'].mean()


Out[11]:
158.2509523809524

In [12]:
# Group the data based on industry classification and count the number of companies in each sector
industry_class = hsi_ahpremium[['Industry Classification','A/H Price Ratio (%)']].groupby('Industry Classification')
industry_class.count()


Out[12]:
A/H Price Ratio (%)
Industry Classification
Consumer Goods 8
Consumer Services 4
Energy 7
Financials 17
Industrials 10
Information Technology 1
Materials 6
Properties & Construction 7
Utilities 3

The table above shows the number of companies in each sector. The index is weighted heavily toward traditional sectors like Financials and Industrials and lightly toward emerging industries like Consumer Services and Information Technology.


In [13]:
# Group the data again based on industry for future statistical analysis.
counts = hsi_ahpremium.groupby(['Industry Classification','A/H Price Ratio (%)','Company Name']).count()
counts.head()


Out[13]:
Industry Classification A/H Price Ratio (%) Company Name
Consumer Goods 90.12 Fuyao Glass
113.46 Sh Pharma
114.17 Fosun Pharma
115.07 Tsingtao Brew
126.14 BYD Company

In [14]:
# Summarize the sum, mean, std, and number of companies for each sector
hsi_summary = industry_class['A/H Price Ratio (%)'].agg([np.sum, np.mean, np.std, len])
hsi_summary


Out[14]:
sum mean std len
Industry Classification
Consumer Goods 1146.38 143.297500 57.195378 8.0
Consumer Services 680.05 170.012500 38.539425 4.0
Energy 1270.11 181.444286 65.103597 7.0
Financials 2030.86 119.462353 14.183785 17.0
Industrials 2007.02 200.702000 61.709290 10.0
Information Technology 164.11 164.110000 NaN 1.0
Materials 1051.50 175.250000 43.152325 6.0
Properties & Construction 1060.88 151.554286 35.986654 7.0
Utilities 558.90 186.300000 24.727032 3.0

On average, Industrials have the highest premium, followed by the Utilities, Energy, and Materials sectors, which are mostly state-owned and traditional industries. In terms of standard deviation, which here measures how widely the A/H ratios vary within a sector, Energy ranks first (65.1), with Industrials close behind (61.7).
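
Sector rankings like these are easy to misread from a table, so a quick sort is a useful check. The sketch below re-enters the `mean` and `std` columns of Out[14] by hand into a hypothetical frame named `summary`; in the notebook itself the same `sort_values` calls could be applied to `hsi_summary` directly.

```python
import pandas as pd

# mean and std columns copied by hand from the Out[14] summary table
summary = pd.DataFrame(
    {
        "mean": [143.2975, 170.0125, 181.444286, 119.462353, 200.702,
                 164.11, 175.25, 151.554286, 186.3],
        "std": [57.195378, 38.539425, 65.103597, 14.183785, 61.70929,
                float("nan"), 43.152325, 35.986654, 24.727032],
    },
    index=["Consumer Goods", "Consumer Services", "Energy", "Financials",
           "Industrials", "Information Technology", "Materials",
           "Properties & Construction", "Utilities"],
)

# Sort descending; sort_values places NaN (the single-company sector) last by default
by_mean = summary["mean"].sort_values(ascending=False)
by_std = summary["std"].sort_values(ascending=False)

print(by_mean.head(4))
print(by_std.head(2))
```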


In [15]:
# Creating a pie chart for industry classification and a bar chart for mean of each industry
fig, ax = plt.subplots(2, 1)

hsi_summary.plot.pie(ax=ax[0],
                     figsize=(4,8), 
                     y='len', 
                     legend = False,
                     autopct='%1.0f%%')

hsi_summary.plot.barh(ax=ax[1], 
                      y="mean", 
                      color = ['blue','orange'], 
                      legend = False)
ax[0].set_title("Components of HSI AH Premium Index", fontsize = 10)
ax[1].set_xlabel("Mean AH Premium Ratio", fontsize = 10)   # barh puts the values on the x-axis


Out[15]:
<matplotlib.text.Text at 0x11c8ba438>

The pie chart above displays the composition of the AH Premium Index: 27% of listings are Financials and 16% are Industrials, while emerging industries like Information Technology account for only 2%. China's economic growth is primarily correlated with the performance of SOEs that receive government subsidies and special policy treatment.
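
The percentage shares can be reproduced from the sector counts in Out[12]. This is a minimal sketch in plain Python; the dictionary `counts` is re-entered by hand from that table, and the rounding to whole percentages is an assumption about how the chart labels were formatted.

```python
# Number of constituents per sector, copied from the Out[12] table
counts = {
    "Consumer Goods": 8,
    "Consumer Services": 4,
    "Energy": 7,
    "Financials": 17,
    "Industrials": 10,
    "Information Technology": 1,
    "Materials": 6,
    "Properties & Construction": 7,
    "Utilities": 3,
}

total = sum(counts.values())
# Share of each sector by number of listings, rounded to a whole percentage
shares = {sector: round(100 * n / total) for sector, n in counts.items()}

for sector, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{sector}: {pct}%")
```

Note that these are shares by number of companies, not by market capitalization.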

Conclusions: AH Premium Index & China's Stock Market

  • China's equity market is not as developed as mature markets such as the United States, owing to a lack of derivative products and state intervention in short-selling activities. 80% of the trading population in China are retail investors, who are often highly levered and inexperienced in stock markets. Afraid of upsetting political and social stability, the China Securities Regulatory Commission (CSRC) refuses to de-list public companies that have failed to perform for three consecutive quarters. During the stock crash in 2015, this policy induced bad investing and allowed reckless investment in underperforming companies that were essentially worthless on the books.

  • HSI's AH Premium Index is a great tool for assessing the valuation of domestic public companies against their Hong Kong listings; the Hong Kong market allows various futures and options and is a free market open to foreign hedge funds. The earlier comparison between the HSI and the Shanghai Composite in 2015 shows that the two markets move together, suggesting similar fundamentals, despite the Shanghai Composite's higher standard deviation. What, then, causes the premium in standard deviation or return?

  • After analyzing the AH Premium Index, I learned that on average Industrials have the highest premium, followed by Utilities, Energy, and Materials, sectors that are mostly state-owned and traditional industries; Energy and Industrials also show the largest standard deviations. Does this suggest that Chinese SOEs (state-owned enterprises) are overvalued relative to a free market? Prior to the stock crash, state-owned media described a bull market that did not match China's economic growth as measured by GDP and PMI, both of which had been declining while the stock market rose irrationally.

  • During the bull market from March 2015 to June 2015, Ping An's AH premium (calculated earlier by dividing the A share price by the H share price) surged and then dropped back to its previous level after the stock crash, a pattern that also applies to other companies included in the AH Index. Although the HSI AH Premium Index includes primarily traditional SOEs with limited exposure to China's emerging industries, it serves as a unique scorecard of China's economic growth by tracking the premium of domestic listings over Hong Kong listings. The new question is whether this new normal of stability after the stock crash is sustainable in the long run, especially once China decides to open itself to foreign investors and allows more advanced derivatives.

Sources and References